Abstract

This paper develops a matrix-variate adaptive Markov chain Monte Carlo (MCMC) methodology for Bayesian Cointegrated Vector Auto Regressions (CVAR). We replace the popular approach to sampling Bayesian CVAR models, involving griddy Gibbs, with an automated efficient alternative, based on the Adaptive Metropolis algorithm of Roberts and Rosenthal (2009). Developing the adaptive MCMC framework for Bayesian CVAR models allows for efficient estimation of posterior parameters in significantly higher dimensional CVAR series than previously possible with existing griddy Gibbs samplers. For an n-dimensional CVAR series, the matrix-variate posterior is in dimension $3n^2 + n$, with significant correlation present between the blocks of matrix random variables. Hence, utilizing a griddy Gibbs sampler for large n becomes computationally impractical, as it involves approximating an $n \times n$ full conditional posterior using a spline over a high dimensional $n \times n$ grid. The adaptive MCMC approach is demonstrated to be ideally suited to learning on-line a proposal that reflects the posterior correlation structure, thereby improving the computational efficiency of the sampler.

We also treat the rank of the CVAR model as a random variable and perform joint 
inference on the rank and model parameters. This is achieved with a Bayesian posterior 
distribution defined over both the rank and the CVAR model parameters, and inference is 
made via Bayes Factor analysis of rank. 

Practically, the adaptive sampler also aids in the development of automated Bayesian cointegration models for algorithmic trading systems considering instruments made up of several assets, such as currency baskets. Previously, the literature on financial applications of CVAR trading models typically considered only pairs trading (n = 2) due to the computational cost of the griddy Gibbs sampler. Under our adaptive framework we are able to extend to $n \gg 2$ and demonstrate an example with n = 10, resulting in a posterior distribution with parameters up to dimension 310. By also considering the rank as a random quantity we can ensure our resulting trading models are able to adjust to potentially time varying market conditions in a coherent statistical framework.

Keywords: Cointegrated Vector Auto Regression, Adaptive Markov chain Monte Carlo, 
Bayesian Inference, Bayes Factor. 
1 Introduction



Bayesian analysis of Cointegrated Vector Auto Regression (CVAR) models has been addressed in several papers; see Koop et al. (2006) for an overview. In a Bayesian CVAR model, the priors on the matrix-variate model parameters must be specified with care to ensure the posterior is not improper, see Koop et al. (2006). This has significant implications for the Bayesian model structure; in particular, one cannot make a blind specification of priors on the VAR model coefficients, as it may result in improper posterior distributions. For this reason it is common to consider the Error Correction Model (ECM) framework, see for example p. 141-142 of Reinsel and Velu (1998). In this paper we do not aim to address the issue of prior choice or prior distortions, and we adopt the model of Sugita (2002) and Geweke (1996), which admits desirable conjugacy properties. The resulting posterior for an n-dimensional CVAR series is matrix-variate in dimension up to $3n^2 + n$ for full rank models, with significant correlation present between and within the blocks of matrix random variables. This presents a challenge to efficiently sample from the posterior distribution when n is large.

The focus and novelty of this paper is the development of a Bayesian adaptive MCMC sampler, based on the algorithm proposed by Roberts and Rosenthal (2009), that allows us to significantly increase the dimension, n, of the CVAR series that can be estimated. Typically in the cointegration literature the sampling approach adopted is a griddy Gibbs sampling framework, see Bauwens and Lubrano (1996), Bauwens and Giot (1997), Geweke (1996), Kleibergen and van Dijk (1994), Sugita (2002) and Sugita (2009). The conjugacy properties of the Bayesian model we consider result in exact sampling of two of the matrix-variate random variables: the unknown error covariance matrix, and the combined matrix random variable containing the cointegration equilibrium reversion rates $\alpha$ and the mean level $\mu$ of the CVAR series. However, the third unknown matrix-variate random variable, corresponding to the cointegration vectors $\beta$, has a marginal posterior distribution with support in dimension $n \times r$. When the cointegration rank r and the dimension of the CVAR series n are large (n > 5), the standard griddy Gibbs based samplers are no longer computationally viable. Alternative samplers, which may attempt to deconstruct the full conditional posterior distribution for the cointegration vectors $\beta$ into components of this matrix and update them one at a time, will run into significant difficulties with the mixing properties of the resulting Markov chain. This is due directly to two factors: the identification normalization constraint on the matrix $\beta$, and the strong correlation present in the full conditional posterior distribution for the matrix random variable $\beta$. Hence, utilizing a griddy Gibbs sampler for large n becomes computationally impractical, as it involves approximating up to an $n \times n$ matrix-variate full conditional posterior using a spline constructed over a high dimensional space with d knot points per dimension, creating a requirement for $d^n$ total grid points. The sampler we develop overcomes these difficulties utilizing an adaptive MCMC approach. We demonstrate that it is ideally suited to learning on-line a proposal that reflects the posterior correlation in the matrix-variate random variable, ensuring that updating this $n \times r$ matrix at each stage of the adaptive MCMC algorithm results in a non-trivial acceptance probability.
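The growth in cost described above can be made concrete with a short Python sketch. It only evaluates the two counts quoted in the text, the posterior dimension $3n^2 + n$ and the $d^n$ grid nodes; the function names and the choice of d = 20 knots are illustrative assumptions, not values from the paper.

```python
def posterior_dimension(n: int) -> int:
    """Dimension 3n^2 + n quoted in the text for a full rank n-dimensional CVAR."""
    return 3 * n * n + n


def grid_points(knots_per_dim: int, n_dims: int) -> int:
    """Grid nodes needed when each of n_dims coordinates gets knots_per_dim knots."""
    return knots_per_dim ** n_dims


# d = 20 knots per dimension is an arbitrary illustrative choice.
for n in (2, 4, 10):
    print(n, posterior_dimension(n), grid_points(20, n))
```

Even with a modest number of knots the node count grows geometrically in n, whereas the posterior dimension grows only quadratically.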

Adaptive MCMC is a new methodology for learning on-line the 'optimal' proposal distribution for an MCMC algorithm; see Atchadé and Rosenthal (2005), Haario, Saksman and Tamminen (2001; 2007), Andrieu and Moulines (2006) and, more recently, Giordani and Kohn (2006) and Silva et al. (2009). There are several different versions of adaptive MCMC and Particle MCMC algorithms. In essence, adaptive MCMC algorithms allow the Markov chain to adapt its proposal distribution online throughout the simulation in such a way that the correct stationary distribution is still preserved, even though the Markov transition kernel of the chain is changing throughout the simulation. Clearly, this requires careful constraints on the type of adaption mechanism and the adaption rate to ensure that stationarity is preserved for the resulting Markov chain.

To summarize, this paper extends the matrix-variate block Gibbs sampling framework typically used in Bayesian Cointegration models by replacing the computational $n \times n$ dimensional griddy Gibbs sampler with two possible automated alternatives, which are based on matrix-variate adaptive Metropolis-within-Gibbs samplers. Additionally, we consider rank estimation for reduced rank Cointegration models. From a Bayesian perspective we tackle this via Bayes Factor (BF) analysis for posterior "model" probabilities of the rank. Then we demonstrate estimation and predictive performance under a Bayesian setting for both Bayesian Model Selection (BMS) and Bayesian Model Averaging (BMA).

The models and algorithms developed allow for estimation of the rank r, i.e. the model index, and the lag p of the CVAR model jointly with the model parameters. For simplicity we shall assume the lag is fixed and known.

In this paper the following notation will be used: $'$ denotes transpose, $I_d$ is the $d \times d$ identity matrix, $p(\cdot)$ denotes a density and $P(\cdot)$ a distribution, $\Omega$ will be the space on which densities take their support, and it will be assumed throughout that we are working with Lebesgue measure. The operator $\otimes$ denotes the Kronecker product, $\|\cdot\|$ denotes the total variation norm and $\Delta$ denotes the unit vector difference operator. We denote generically the state of a Markov chain at time j by the random variable $\Theta^{(j)}$ and the transition kernel from realized state $\Theta^{(j-1)} = \theta$ to $\Theta^{(j)}$ by $Q(\theta, \Theta^{(j)})$. In the case of an adaptive transition kernel we will also assume that there is a sequence of transition kernels denoted by $Q_{\Gamma_j}(\theta, \Theta^{(j)})$, where $\Gamma_j$ is the sequence index.

1.1 Contribution and structure 

In section 2 we present the matrix-variate posterior distribution for the CVAR model formulated under an Error Correction Model (ECM) framework. Next, in section 3, we discuss the Bayesian CVAR model conditional on knowledge of the co-integration rank. This includes discussing and summarizing properties of the Bayesian CVAR model, including identification, the justification of the ECM framework and issues to consider when selecting matrix priors for Bayesian CVAR models with respect to prior distortions. At this stage we make explicit the justification for why the Bayesian model decomposes the cointegration matrix $\Pi = \alpha\beta'$ under the ECM framework, since working directly with $\Pi$ precludes direct use of Monte Carlo samples for inference in the VAR model setting. As pointed out in Geweke (1996) and Sugita (2002), conditional on the matrix $\beta$ the nonlinear ECM model becomes linear, and therefore under the informative priors we utilize we can once again apply standard Bayesian analysis to the VAR model. This turns out to be a very useful property, widely used in the cointegration literature. Then in section 4 we present the two algorithms developed, based on Adaptive MCMC, to obtain samples from the target posterior, followed by section 5, which presents the framework for rank estimation we utilize, along with discussion of model selection and model averaging with respect to the unknown rank of the CVAR system. We conclude with synthetic simulation examples with n ranging from 4 to 10, resulting in posteriors defined in between 52 and 310 dimensions. We also provide analysis of two real data examples, from the pairs and triples trading typically considered in real world financial algorithmic trading models.

2 CVAR model under ECM framework 

We note that a clear presentation of co-integration models is provided by Engle and Granger (1987), Sugita (2002) and Sugita (2009); for the original error correction representation of a co-integrated series, see Granger (1981) and Granger and Weiss (1983). The model presented in Sugita (2009) is based on the model of Strachan and van Dijk (2007), and it generalizes the VECM model in Sugita (2002) to include explicitly the possibility of an intercept and a linear time trend. In this paper we will consider a CVAR model in which we have an intercept term but no time trend; the extension to include a time trend is trivial to incorporate into our simulation methodology. Based on the definitions of Sugita (2002) for a co-integrated series, we denote the vector observation at time t by $x_t$. Furthermore, we assume $x_t$ is an integrated of order 1, I(1), $(n \times 1)$-dimensional vector with r linear cointegrating relationships. The error vectors $\epsilon_t$ are assumed independent over time and zero mean multivariate Gaussian distributed, with covariance $\Sigma$. The Error Correction Model (ECM) representation we consider is given by,



$$\Delta x_t = \mu + \alpha\beta' x_{t-1} + \sum_{i=1}^{p-1} \Psi_i \Delta x_{t-i} + \epsilon_t, \qquad (2.1)$$



where $t = p, p+1, \ldots, T$ and p is the number of lags. Furthermore, the matrix dimensions are: $\mu$ and $\epsilon_t$ are $(n \times 1)$, $\Psi_i$ and $\Sigma$ are $(n \times n)$, $\alpha$ and $\beta$ are $(n \times r)$.

We can now re-express the model in equation (2.1) in a multivariate regression format, as follows

$$Y = X\Gamma + Z\beta\alpha' + E = WB + E. \qquad (2.2)$$

Here, we let t be the number of rows of Y, hence $t = T - p + 1$, producing X with dimension $t \times (1 + n(p-1))$, $\Gamma$ with dimension $(1 + n(p-1)) \times n$, W with dimension $t \times k$ and B with dimension $k \times n$, where $k = 1 + n(p-1) + r$; see Sugita (2002) for additional details regarding this parameterization. The parameter $\mu$ represents the trend coefficients, $\Psi_i$ is the i-th matrix of autoregressive coefficients and the long run multiplier matrix is given by $\Pi = \alpha\beta'$.

The long run multiplier matrix is an important quantity of this model. Its properties include: if $\Pi$ is a zero matrix, the series $x_t$ contains n unit roots; if $\Pi$ has full rank then each univariate series in $x_t$ is (trend-)stationary; and co-integration occurs when $\Pi$ is of rank r < n. The matrix $\beta$ contains the co-integration vectors, reflecting the stationary long run relationships between the univariate series within $x_t$, and the $\alpha$ matrix contains the adjustment parameters, specifying the speed of adjustment to the equilibria $\beta' x_t$.
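To illustrate the roles of $\alpha$, $\beta$ and $\Pi$, the following Python sketch simulates a series from (2.1) in the simplest case p = 1, so that no lagged difference terms $\Psi_i$ appear. The function name and the example parameter values are hypothetical and chosen only so that the cointegrating relation is stable.

```python
import numpy as np


def simulate_ecm(alpha, beta, mu, sigma, T, seed=0):
    """Simulate x_t from Delta x_t = mu + alpha beta' x_{t-1} + eps_t (case p = 1)."""
    rng = np.random.default_rng(seed)
    n = alpha.shape[0]
    chol = np.linalg.cholesky(sigma)       # eps_t = chol @ z has covariance sigma
    pi = alpha @ beta.T                    # long run multiplier matrix, rank r
    x = np.zeros((T + 1, n))
    for t in range(1, T + 1):
        eps = chol @ rng.standard_normal(n)
        x[t] = x[t - 1] + mu + pi @ x[t - 1] + eps
    return x


# Hypothetical n = 3, r = 1 example.
alpha = np.array([[-0.2], [0.1], [0.0]])
beta = np.array([[1.0], [-1.0], [0.5]])
x = simulate_ecm(alpha, beta, mu=np.zeros(3), sigma=np.eye(3), T=200)
print(np.linalg.matrix_rank(alpha @ beta.T))  # rank of Pi equals r
```

Each component of x behaves like a random walk, while the combination $\beta' x_t$ is pulled back towards equilibrium at a speed governed by $\alpha$.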



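This results in a likelihood model, where the parameters of interest are B, $\Sigma$ and $\beta$, given by

$$\mathcal{L}(B, \Sigma, \beta | Y) = (2\pi)^{-0.5nt}\,|\Sigma \otimes I_t|^{-0.5} \exp\left(-0.5\,\mathrm{Vec}(Y - WB)'(\Sigma^{-1} \otimes I_t)\mathrm{Vec}(Y - WB)\right)$$

where $\Sigma = \mathrm{Cov}(E)$ and

$$R = (B - \hat{B})'W'W(B - \hat{B}), \quad \hat{S} = (Y - W\hat{B})'(Y - W\hat{B}), \quad \hat{B} = (W'W)^{-1}W'Y.$$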
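The quantities $R$, $\hat{S}$ and $\hat{B}$ defined above are the usual ingredients of the least squares decomposition $(Y - WB)'(Y - WB) = \hat{S} + R$, which holds for any B because the OLS residuals are orthogonal to W. The Python sketch below checks this numerically on random matrices; the dimensions are arbitrary and W is a random stand-in rather than a matrix built from CVAR data.

```python
import numpy as np

rng = np.random.default_rng(1)
t, k, n = 50, 4, 3
W = rng.standard_normal((t, k))    # stand-in regressor matrix
Y = rng.standard_normal((t, n))    # stand-in response matrix
B = rng.standard_normal((k, n))    # arbitrary coefficient value

B_hat = np.linalg.solve(W.T @ W, W.T @ Y)          # OLS estimate (W'W)^{-1} W'Y
S_hat = (Y - W @ B_hat).T @ (Y - W @ B_hat)        # residual sum of squares
R = (B - B_hat).T @ W.T @ W @ (B - B_hat)          # quadratic form in B - B_hat

lhs = (Y - W @ B).T @ (Y - W @ B)
print(np.allclose(lhs, S_hat + R))
```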

3 Bayesian CVAR models conditional on Rank (r) 

The assumptions and restrictions of our Bayesian CVAR model include: 

1. Identification Issue: For any non-singular matrix A, the matrix of long run multipliers $\Pi = \alpha\beta'$ is indistinguishable from $\Pi = \alpha A A^{-1}\beta'$, see Koop et al. (2006) or Reinsel and Velu (1998). We use a standard approach to globally overcome this problem by incorporating a non unique identification constraint. We impose $r^2$ restrictions as follows: $\beta = [I_r, \beta_1]'$, where $I_r$ denotes the $r \times r$ identity matrix. However, as noted by Kleibergen and van Dijk (1994) and discussed in Koop et al. (2006), this can still result in local identification issues at the point $\alpha = 0$, when $\beta$ does not enter the model. Hence, one must be careful to ensure that the Markov chain generated by the matrix-variate block Gibbs sampler is not invalidated by the terminal absorbing state. As is standard, we monitor the performance of the sampler to ensure this has not occurred.

2. Error Correction Model: The ECM framework complicates Bayesian analysis since products, $\alpha\beta'$, preclude direct use of Monte Carlo samples for inference in the VAR model setting. However, conditional on $\beta$ the nonlinear ECM model becomes linear, and therefore under the informative priors used by Geweke (1996) and Sugita (2002) we can once again apply standard Bayesian analysis to the VAR model.

3. Prior Choices: We do not consider the issue of prior distortions illustrated by Kleibergen and van Dijk (1994), as this is not the focus of the present paper. Alternative prior models in the cointegration setting include Jeffreys priors, the embedding approach and a focus on the cointegration space.
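The identification constraint in item 1 can be illustrated numerically. The Python sketch below rotates an arbitrary pair $(\alpha, \beta)$ so that the top $r \times r$ block of $\beta$ becomes the identity while $\Pi = \alpha\beta'$ is unchanged. The function name is hypothetical, and the sketch assumes the top block of $\beta$ is non-singular.

```python
import numpy as np


def normalize_beta(alpha, beta):
    """Return (alpha_n, beta_n) with beta_n[:r] = I_r and alpha_n beta_n' = alpha beta'."""
    r = beta.shape[1]
    top = beta[:r, :]                         # r x r block, assumed non-singular
    beta_n = beta @ np.linalg.inv(top)        # top block becomes the identity
    alpha_n = alpha @ top.T                   # compensating rotation of alpha
    return alpha_n, beta_n


rng = np.random.default_rng(2)
alpha = rng.standard_normal((4, 2))
beta = rng.standard_normal((4, 2))
alpha_n, beta_n = normalize_beta(alpha, beta)
print(np.allclose(alpha_n @ beta_n.T, alpha @ beta.T))
```

Only the remaining $(n - r) \times r$ block of the normalized $\beta$ is free, which is why the samplers later in the paper propose moves for the unrestricted elements alone.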
3.1 Prior and Posterior Model

Here we present the model for estimation of $\beta$, B and $\Sigma$ conditional on the rank r. As in Sugita (2002), we use a conjugate hierarchical prior.

• $\beta \sim N(\bar{\beta}, Q \otimes H^{-1})$, where $N(\bar{\beta}, Q \otimes H^{-1})$ is the matrix-variate Gaussian distribution with prior mean $\bar{\beta}$, Q is an $(r \times r)$ positive definite matrix and H an $(n \times n)$ matrix.

• $\Sigma \sim IW(S, h)$, where $IW(S, h)$ is the Inverse Wishart distribution with h degrees of freedom and S is an $(n \times n)$ positive definite matrix.

• $B|\Sigma \sim N(P, \Sigma \otimes A^{-1})$, where $N(P, \Sigma \otimes A^{-1})$ is the matrix-variate Gaussian distribution with prior mean P, which is $k \times n$, and A is a $(k \times k)$ matrix, with $k = n(p-1) + 1 + r$, which corresponds to the number of columns in W.

Combining the priors and likelihood produces matrix-variate conditional posterior distributions (derivation details provided in Sugita (2002)):

• Inverse Wishart distribution for $p(\Sigma|\beta, Y) \propto |S_\star|^{(t+h)/2}\,|\Sigma|^{-(t+h+n+1)/2} \exp\left(-0.5\,\mathrm{tr}(\Sigma^{-1}S_\star)\right)$, which is trivial to sample exactly;

• Matrix-variate Gaussian for $p(B|\beta, \Sigma, Y) \propto |A_\star|^{n/2}\,|\Sigma|^{-k/2} \exp\left(-0.5\,\mathrm{tr}\left(\Sigma^{-1}(B - B_\star)'A_\star(B - B_\star)\right)\right)$ (or alternatively a matrix-variate Student-t distribution form for $p(B|\beta, Y)$), both trivial to sample exactly;

• The marginal matrix-variate posterior for the cointegration vectors, $\beta|Y$, is not well studied and is given by

$$p(\beta|Y) \propto p(\beta)\,|S_\star|^{-(t+h+1)/2}\,|A_\star|^{-n/2}, \qquad (3.1)$$

where we define $A_\star = A + W'W$, $B_\star = (A + W'W)^{-1}(AP + W'W\hat{B})$ and $S_\star = S + \hat{S} + (P - \hat{B})'[A^{-1} + (W'W)^{-1}]^{-1}(P - \hat{B})$.
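As a minimal sketch of the two exact Gibbs draws above, the Python code below forms $A_\star$, $B_\star$ and $S_\star$ and then samples $\Sigma$ and $B$ for a fixed W (that is, conditional on $\beta$). It assumes the Inverse Wishart degrees of freedom t + h are an integer, so that the Wishart draw can be built from Gaussian outer products; the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def posterior_terms(Y, W, P, A, S):
    """Compute A_star, B_star and S_star from the definitions following (3.1)."""
    WtW = W.T @ W
    B_hat = np.linalg.solve(WtW, W.T @ Y)
    S_hat = (Y - W @ B_hat).T @ (Y - W @ B_hat)
    A_star = A + WtW
    B_star = np.linalg.solve(A_star, A @ P + WtW @ B_hat)
    middle = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(WtW))
    S_star = S + S_hat + (P - B_hat).T @ middle @ (P - B_hat)
    return A_star, B_star, S_star


def draw_sigma_and_B(A_star, B_star, S_star, dof, rng):
    """Draw Sigma ~ IW(S_star, dof), then B | Sigma ~ N(B_star, Sigma (x) A_star^{-1})."""
    n, k = S_star.shape[0], A_star.shape[0]
    # Sigma ~ IW(S_star, dof) is equivalent to Sigma^{-1} ~ Wishart(S_star^{-1}, dof).
    L = np.linalg.cholesky(np.linalg.inv(S_star))
    Z = L @ rng.standard_normal((n, dof))
    sigma = np.linalg.inv(Z @ Z.T)
    # Matrix normal draw: row covariance A_star^{-1}, column covariance Sigma.
    L_row = np.linalg.cholesky(np.linalg.inv(A_star))
    L_col = np.linalg.cholesky(sigma)
    B = B_star + L_row @ rng.standard_normal((k, n)) @ L_col.T
    return sigma, B
```

In a full sampler W depends on the current $\beta$, so `posterior_terms` would be re-evaluated whenever $\beta$ is updated.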

4 Sampling and Estimation Conditional on Rank r 

Here we focus on obtaining samples from the posterior distribution which can be used to obtain Bayesian parameter estimates (MMSE, MAP). The complication in sampling arises with the full conditional posterior (3.1), which cannot be sampled from via straightforward inversion sampling. In this paper we outline novel algorithms to sample from the posterior distribution $p(\beta|Y, B, \Sigma, r)$, providing an alternative automated approach to the griddy Gibbs sampler algorithm made popular in this Bayesian co-integration setting by Bauwens and Lubrano (1996).

The matrix-variate griddy Gibbs sampler numerically approximates the target posterior on a grid of values and then performs numerical inversion to obtain samples from (3.1) at each stage of the MCMC algorithm. Such a grid based procedure suffers from the curse of dimensionality and becomes highly inefficient when n is large (n > 5). Note that alternative approaches such as Importance Sampling will also be problematic once n becomes too large, since it is difficult to choose an Importance Sampling distribution that minimizes the variance of the importance weights.

Instead we propose alternative samplers using adaptive matrix-variate MCMC methodology. 
They do not suffer from the curse of dimensionality and are simple to implement and automate. 

• Algorithm 1 - Random Walk (mixture of local and global moves): Involves an offline adaptively pretuned mixture proposal containing a combination of local and global Random Walk (RW) moves. The proposals for the local RW moves have standard deviations tuned to produce average acceptance probabilities in the interval [0.3, 0.5]. The independent global matrix-variate proposal updates all elements of $\beta$ via a multivariate Gaussian proposal centered on the Maximum Likelihood parameter estimates for $\beta$, with the Fisher information matrix used for the covariance of the global proposal. This is similar to the approach adopted in Vermaak et al. (2004) and Fan et al. (2009).

• Algorithm 2 - Adaptive Random Walk: Involves an online matrix-variate adaptive Metropolis algorithm based on the methodology presented in Roberts and Rosenthal (2009).

The following sections denote the algorithmic 'time' index by j and the current state of a Markov chain for a generic parameter $\theta$ at time j by $\theta^{(j)}$. The length of the Markov chain is J.

Note, since we have imposed $r^2$ restrictions in the form of $I_r$, any proposal for $\beta = [I_r, \tilde{\beta}]$ will only correspond to the unrestricted elements of $\beta$, denoted by $\tilde{\beta}$. In our case, these correspond to those in locations $(n - r) \times r$.

4.1 Algorithm 1 

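In Algorithm 1 the mixture proposal distribution for the unrestricted parameters $\tilde{\beta}$ of $\beta$ combines, with weight $w_1$, a global independent Gaussian move centered on the Maximum Likelihood estimate and, with weight $1 - w_1$, a local random walk move on one of the $(n - r) \times r$ unrestricted elements chosen uniformly,

$$q(\tilde{\beta}^{(j-1)}, \tilde{\beta}^*) = w_1 N\left(\tilde{\beta}^*; \hat{\beta}^{ML}, \hat{\Sigma}^{ML}\right) + (1 - w_1)\sum_{i=1}^{(n-r) \times r} \frac{1}{(n-r)r} N\left(\tilde{\beta}^*_i; \tilde{\beta}^{(j-1)}_i, \sigma^2_i\right),$$

where i indexes the unrestricted elements in a fixed ordering. The Maximum Likelihood parameters are obtained off-line, see Lütkepohl (2007, p. 286). The local random walk proposal variances for each element of $\tilde{\beta}$, written $\sigma^2_{i,k}$ for the element in position (i, k), are obtained via pre-tuning.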
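The pre-tuning procedure itself is not spelled out in the text. One simple possibility, shown here purely as an assumed illustration, is to rescale the random walk standard deviation in short pilot runs until the empirical acceptance rate falls in the stated interval [0.3, 0.5]. The Python sketch below does this for a one-dimensional toy target; the scaling factors 0.7 and 1.4 and all function names are hypothetical choices.

```python
import math
import random


def acceptance_rate(log_target, sigma, n_iter, rng):
    """Empirical acceptance rate of a 1-D random walk Metropolis pilot run."""
    x, accepted = 0.0, 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, sigma)
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = proposal, accepted + 1
    return accepted / n_iter


def pretune_sigma(log_target, sigma=1.0, low=0.3, high=0.5,
                  n_iter=2000, max_rounds=50, seed=0):
    """Shrink sigma when too few moves are accepted, grow it when too many are."""
    rng = random.Random(seed)
    rate = acceptance_rate(log_target, sigma, n_iter, rng)
    for _ in range(max_rounds):
        if low <= rate <= high:
            break
        sigma *= 0.7 if rate < low else 1.4
        rate = acceptance_rate(log_target, sigma, n_iter, rng)
    return sigma, rate


# Toy standard Gaussian target, started from a deliberately poor scale.
sigma, rate = pretune_sigma(lambda x: -0.5 * x * x, sigma=50.0)
print(sigma, rate)
```

In the CVAR setting one such scale would be tuned for each unrestricted element of $\beta$, which is part of the offline cost that Algorithm 2 avoids.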

4.2 Algorithm 2: Adaptive Metropolis within Gibbs sampler moves for CVAR 
model given rank r 

There are several classes of adaptive MCMC algorithms, see Roberts and Rosenthal (2009). 
The distinguishing feature of adaptive MCMC algorithms, compared to standard MCMC, is 
generation of the Markov chain via a sequence of transition kernels. Adaptive algorithms utilize 
a combination of time or state inhomogeneous proposal kernels. Each proposal in the sequence is 
allowed to depend on the past history of the Markov chain generated, resulting in many variants. 

Due to the inhomogeneity of the Markov kernel used in adaptive algorithms, it is particularly important to ensure the generated Markov chain is ergodic, with the appropriate stationary distribution. Several recent papers propose theoretical conditions that must be satisfied to ensure ergodicity of adaptive algorithms, including Atchadé and Rosenthal (2005), Roberts and Rosenthal (2009), Haario et al. (2007), Andrieu and Moulines (2006) and Andrieu and Atchadé (2007).

Haario et al. (2001) developed an adaptive Metropolis algorithm with proposal covariance 
adapted to the history of the Markov chain. The original proof of ergodicity of the Markov 
chain under such an adaption was overly restrictive. It required a bounded state space and a 
uniformly ergodic Markov chain. 

Roberts and Rosenthal (2009) proved ergodicity of adaptive MCMC under simpler conditions known as Diminishing Adaptation and Bounded Convergence. As in Roberts and Rosenthal (2009), we assume that each fixed kernel in the sequence $Q_\gamma$ has stationary distribution $P(\cdot)$. Define the convergence time for kernel $Q_\gamma$ when starting from state $\theta$ as $M_\epsilon(\theta, \gamma) = \inf\{j \geq 1 : \|Q_\gamma^j(\theta, \cdot) - P(\cdot)\| \leq \epsilon\}$. Under these assumptions, they derive the sufficient conditions:

• Diminishing Adaptation: $\lim_{j \to \infty} \sup_{\theta \in E} \|Q_{\Gamma_{j+1}}(\theta, \cdot) - Q_{\Gamma_j}(\theta, \cdot)\| = 0$ in probability. Note, $\Gamma_j$ are random indices.

• Bounded Convergence: $\{M_\epsilon(\Theta^{(j)}, \Gamma_j)\}_{j=0}^{\infty}$ is bounded in probability for all $\epsilon > 0$.

which guarantee asymptotic convergence in two senses,

• Asymptotic convergence: $\lim_{j \to \infty} \|\mathcal{L}(\Theta^{(j)}) - P(\cdot)\| = 0$.

• WLLN: $\lim_{j \to \infty} \frac{1}{j}\sum_{i=1}^{j} g(\Theta^{(i)}) = \int g(\theta)p(\theta)d\theta$ for all bounded $g : E \to \mathbb{R}$.
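The intuition behind Diminishing Adaptation for a covariance-based proposal can be seen from how slowly an empirical covariance changes once many draws are available. The Python sketch below tracks the Frobenius norm of successive changes in the empirical covariance of independent Gaussian draws. It is only an informal illustration under that simplifying assumption and does not verify the formal condition on transition kernels; the function name is hypothetical.

```python
import numpy as np


def running_cov_steps(draws):
    """Frobenius norm of successive changes in the empirical covariance of draws[:j]."""
    steps, prev = [], None
    for j in range(3, draws.shape[0] + 1):
        cov = np.atleast_2d(np.cov(draws[:j], rowvar=False))
        if prev is not None:
            steps.append(np.linalg.norm(cov - prev))
        prev = cov
    return np.array(steps)


rng = np.random.default_rng(0)
steps = running_cov_steps(rng.standard_normal((1500, 2)))
print(steps[:100].mean(), steps[-100:].mean())
```

The later increments are much smaller than the early ones, so a proposal built from this covariance changes less and less between iterations.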
Algorithm 1: MH within Gibbs sampler for fixed rank r via a pretuned mixture of global and local moves.

Input: Initial Markov chain state $(\Sigma^{(0)}, B^{(0)}, \beta^{(0)})$.

Output: Markov chain samples $\{\Sigma^{(j)}, B^{(j)}, \beta^{(j)}\}_{j=1:J} \sim p(\Sigma, B, \beta|Y)$.

begin

  1. Set initial state $(\Sigma^{(0)}, B^{(0)}, \beta^{(0)})$ deterministically or by sampling the priors.

  2. Calculate Maximum Likelihood parameters $\hat{\beta}^{ML}$ and $\hat{\Sigma}^{ML}$.

  3. Initialize $w_1$ and $w_2 = 1 - w_1$ and index j = 1.

  repeat

    5. Sample $\Sigma$ via inversion to obtain $\Sigma^{(j)}$.

    6. Sample B via inversion to obtain $B^{(j)}$.

    7. Sample realization U = u where U ~ U[0, 1].

    if $u > w_1$ then /* perform a local random walk move */

      7a. Sample uniformly index (i, k) from the set of $(n - r) \times r$ elements.

      7b. Sample the (i, k)-th component $\tilde{\beta}^*_{i,k} \sim N(\tilde{\beta}_{i,k}; \tilde{\beta}^{(j-1)}_{i,k}, \sigma^2_{i,k})$.

      7c. Construct proposal $\beta^* = [I_{r \times r}, \tilde{\beta}^*]$, where $\tilde{\beta}^*$ is $\tilde{\beta}^{(j-1)}$ with the (i, k)-th element given by $\tilde{\beta}^*_{i,k}$.

    else /* perform a global independent move */

      7a. Sample proposal $\tilde{\beta}^* \sim N(\tilde{\beta}; \hat{\beta}^{ML}, \hat{\Sigma}^{ML})$.

      7b. Construct proposal $\beta^* = [I_{r \times r}, \tilde{\beta}^*]$.

    8. Calculate the Metropolis Hastings acceptance probability:

      $$A(\beta^{(j-1)}, \beta^*) = \min\left\{1, \frac{p(\Sigma^{(j)}, B^{(j)}, \beta^*|Y)\,q(\beta^* \to \beta^{(j-1)})}{p(\Sigma^{(j)}, B^{(j)}, \beta^{(j-1)}|Y)\,q(\beta^{(j-1)} \to \beta^*)}\right\}$$

      Accept $\beta^{(j)} = \beta^*$ via rejection using A, otherwise $\beta^{(j)} = \beta^{(j-1)}$.

    9. j = j + 1

  until j = J

end
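Step 8 of Algorithm 1 is the standard Metropolis Hastings accept or reject decision. A minimal Python sketch is given below, working with log densities to avoid underflow. The function names are hypothetical, and the log posterior and log proposal values are assumed to be supplied by the caller.

```python
import math


def mh_log_accept_prob(log_post_prop, log_post_cur, log_q_reverse, log_q_forward):
    """log of min(1, [p(prop) q(prop -> cur)] / [p(cur) q(cur -> prop)])."""
    log_ratio = (log_post_prop + log_q_reverse) - (log_post_cur + log_q_forward)
    return min(0.0, log_ratio)


def mh_step(current, proposal, log_alpha, u):
    """Return the next state given a uniform draw u on [0, 1)."""
    return proposal if u < math.exp(log_alpha) else current


# A symmetric proposal (equal q terms) moving to a lower density point.
log_alpha = mh_log_accept_prob(-3.0, -2.0, 0.0, 0.0)
print(log_alpha, mh_step("current", "proposal", log_alpha, u=0.9))
```

For a symmetric random walk the two proposal terms cancel, while for the global independent move they must be retained.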
It is non-trivial to develop adaption schemes which can be verified to satisfy these two conditions. We develop a matrix-variate adaptive MCMC methodology in the CVAR setting, using a proposal kernel known to satisfy these two ergodicity conditions for unbounded state spaces and general classes of target posterior distribution, see Roberts and Rosenthal (2009) for details.

In Algorithm 2 the mixture proposal distribution for the unrestricted parameters $\tilde{\beta}$ of $\beta$, which are $d = (n - r) \times r$ dimensional, is given at iteration j by,

$$q_j(\tilde{\beta}^{(j-1)}, \cdot) = w_1 N\left(\tilde{\beta}; \tilde{\beta}^{(j-1)}, \frac{(2.38)^2}{d}\Sigma_j\right) + (1 - w_1) N\left(\tilde{\beta}; \tilde{\beta}^{(j-1)}, \frac{(0.1)^2}{d}I_{d,d}\right). \qquad (4.3)$$

Here, $\Sigma_j$ is the current empirical estimate of the covariance between the parameters of $\tilde{\beta}$, estimated using samples from the Markov chain up to time j. The theoretical motivation for the choices of scale factors 2.38, 0.1 and dimension d are all provided in Roberts and Rosenthal (2009) and are based on optimality conditions presented in Roberts et al. (1997) and Roberts and Rosenthal (2001). The adaptive MCMC Algorithm 2 is identical to Algorithm 1 except that we replace step 7 with the following alternative:

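Algorithm 2: matrix-variate adaptive MH within Gibbs sampler for fixed rank r.

    if $u > w_1$ then /* perform an adaptive random walk move */

      7a. Estimate $\Sigma_j$, the empirical covariance of $\tilde{\beta}$ for elements in $(n - r) \times r$, using samples from the Markov chain up to time j.

      7b. Sample proposal $\tilde{\beta}^* \sim N\left(\tilde{\beta}; \tilde{\beta}^{(j-1)}, \frac{(2.38)^2}{d}\Sigma_j\right)$.

      7c. Construct proposal $\beta^* = [I_{r \times r}, \tilde{\beta}^*]$.

    else /* perform a non-adaptive random walk move */

      7a. Sample proposal $\tilde{\beta}^* \sim N\left(\tilde{\beta}; \tilde{\beta}^{(j-1)}, \frac{(0.1)^2}{d}I_{d,d}\right)$.

      7b. Construct proposal $\beta^* = [I_{r \times r}, \tilde{\beta}^*]$.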
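A minimal Python sketch of a draw from the mixture proposal (4.3) is given below. It follows the component weights as written in (4.3), so the adaptive component is chosen with probability $w_1$. Falling back to the non-adaptive component until more than 2d past draws are available is an assumption made here so that the empirical covariance is usable; the function name and the value $w_1 = 0.95$ in the example are likewise hypothetical.

```python
import numpy as np


def adaptive_proposal(beta_cur, history, w1, rng):
    """Draw from the two-component Gaussian random walk mixture in (4.3).

    beta_cur: flattened unrestricted elements of beta (length d).
    history:  array of shape (j, d) holding past draws of those elements.
    """
    d = beta_cur.size
    # Assumption: only adapt once enough draws exist to estimate a d x d covariance.
    use_adaptive = history.shape[0] > 2 * d and rng.random() < w1
    if use_adaptive:
        cov = (2.38 ** 2 / d) * np.cov(history, rowvar=False)
    else:
        cov = (0.1 ** 2 / d) * np.eye(d)
    return rng.multivariate_normal(beta_cur, np.atleast_2d(cov))


rng = np.random.default_rng(0)
history = rng.standard_normal((100, 3))      # stand-in for past chain states
print(adaptive_proposal(np.zeros(3), history, w1=0.95, rng=rng))
```

Because both components are symmetric random walks centered on the current state, the proposal terms cancel in the acceptance probability of step 8.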
5 Rank Estimation for Bayesian VAR Cointegration models

Here we discuss the Bayes Factor approach to rank estimation, noting that it is computationally inefficient, since it involves running n + 1 Markov chains, one for each model (rank r). For a sophisticated alternative which presents a novel TD-MCMC based approach, requiring a single Markov chain to obtain samples from the posterior distribution $p(B, \Sigma, \beta, r|Y)$, see Peters et al. (2009).

5.1 Posterior Model Probabilities for Rank r via Bayes Factors 

In Sugita (2002) and Kleibergen and Paap (2002) the rank is estimated via Bayes factors, a popular approach to Bayesian model selection in the Bayesian cointegration literature. We note that alternative approaches to rank estimation include Strachan and van Dijk (2004) and Strachan and van Dijk (2007). Sugita (2002) works with a conjugate prior on $\alpha$, which avoids Bartlett's paradox, so that the posterior probabilities of the rank are well defined.

5.1.1 Bayes Factors 

The earlier work of Sugita (2002) compares the rank r model, with unrestricted $\alpha$, to the rank 0 setting. Note that Kleibergen and Paap (2002) take a slightly different approach, in that they compare each rank r to the full rank case for the unrestricted $\alpha$ parameter. More recently, Sugita (2009) revisits the important question of rank estimation via Bayes Factors, also comparing the Schwarz BIC approximation and Chib's (1995) approach for the marginal likelihood.

Under a rank comparison, the posterior model probabilities are given by,

$$Pr(r|Y) = \frac{BF_{r|0}}{\sum_{j=0}^{n} BF_{j|0}}, \qquad (5.1)$$

with $BF_{0|0}$ defined as 1.
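Equation (5.1) is a simple normalization, but Bayes Factors can be extremely large, so it is safer to evaluate it from log Bayes Factors. The Python sketch below does this with the usual subtract-the-maximum device; the function name and the example values are hypothetical.

```python
import math


def rank_posterior_from_log_bf(log_bf):
    """Pr(r | Y) for r = 0..n from log_bf[r] = log BF_{r|0}, with log_bf[0] = 0."""
    m = max(log_bf)
    weights = [math.exp(v - m) for v in log_bf]   # largest weight is exactly 1
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log Bayes Factors; exp(805) would overflow if formed directly.
probs = rank_posterior_from_log_bf([0.0, 800.0, 805.0])
print(probs)
```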

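In the calculation of $BF_{r|0}$, Sugita (2002) recommends an approach first introduced by Verdinelli and Wasserman (1995) for nested model structure Bayes factors, which results in

$$BF_{r|0} = \frac{p(\alpha' = 0_{r \times n})}{C_r\,p(\alpha' = 0_{r \times n}|Y)} = \frac{\int p(\alpha, \beta, \Gamma, \Sigma|Y)\,d\alpha\,d\beta\,d\Gamma\,d\Sigma}{C_r^{-1}\int p(\alpha, \beta, \Gamma, \Sigma|Y)|_{\mathrm{rank}(\alpha)=0}\,d\alpha\,d\beta\,d\Gamma\,d\Sigma} \qquad (5.2)$$

where the correction factor for the reduction in dimension $C_r$ is given by,

$$C_r = \int p(\alpha, \beta, \Gamma, \Sigma)|_{\mathrm{rank}(\alpha)=0}\,d\beta\,d\Gamma\,d\Sigma. \qquad (5.3)$$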
We note that Sugita (2002) does not comment on numerical complications that can arise when implementing this estimator for the CVAR model. We detail in Appendix 1 the steps that were critical to the calculation of the Bayes Factors when handling potential numerical overflows. The numerical issues arise as t increases; for example, the term $|S_\star|^{(t+h)/2}$ will explode numerically. This will result in incorrect numerical results for the Bayes Factors if not handled appropriately.
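One common way to avoid this kind of overflow, given here as a generic sketch rather than the procedure of Appendix 1, is to work with the logarithm of the determinant term throughout. The Python code below uses NumPy's `slogdet` for this; the function name is hypothetical.

```python
import numpy as np


def log_det_power(S_star, t, h):
    """Return log(|S_star|^{(t+h)/2}) without forming the determinant itself."""
    sign, logdet = np.linalg.slogdet(S_star)
    if sign <= 0:
        raise ValueError("S_star must be positive definite")
    return 0.5 * (t + h) * logdet


# |S|^{(t+h)/2} = (10^5)^{503} is far beyond double precision, its log is not.
S = 10.0 * np.eye(5)
print(log_det_power(S, t=1000, h=6))
```

Ratios of such terms can then be formed by subtracting logs and exponentiating only at the end.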

5.2 Model Selection, Model Averaging and Prediction 

With samples from $p(\beta, B, \Sigma, r|Y)$ one can consider either model selection or model averaging. In a survey of the literature on rank selection, the most common form of inference performed involves model selection. In this paper we note that model averaging should also be considered, especially when, given the realized data, two different ranks both have high posterior model probabilities. We argue that by adopting the Bayesian model averaging framework one is able to reduce the potential model risk associated with selection of the rank from several choices, which may all be fairly probable under the posterior. This in turn should reduce the associated model risk involved in the popular application of CVAR models in algorithmic trading strategies based on these co-integration frameworks and estimation of the rank.

In this case one can use the samples from $p(\beta, B, \Sigma|Y, r)$ in each model r to form a weighted model averaged estimate through the direct knowledge of the estimated model probabilities given by $p(r|Y)$. Model averaging in the CVAR context is discussed in Koop et al. (2006).

Bayesian Model Order Selection (BMOS) 

In BMOS we select the most probable model, corresponding to the maximum a posteriori (MAP) estimate from $p(r|Y)$, denoted $\hat{r}_{MAP}$. Conditional on $\hat{r}_{MAP}$, we then take the samples $\{\beta^{(j)}, B^{(j)}, \Sigma^{(j)}\}_{j=1:M}$ from the Markov chain simulated for the $\hat{r}_{MAP}$ model and compute point estimates for the parameters.

These point estimates typically include posterior means or modes, though one should be careful. It was demonstrated by Kleibergen and van Dijk (1994) and Bauwens and Lubrano (1996) that in many popular Bayesian CVAR models certain choices of prior result in a posterior that is proper yet has no finite moments of any order. Some alternatives are proposed by Strachan and Inder (2004).






Bayesian Model Averaging (BMA) 

In this section we consider the problem of estimating, for example, an integral of a quantity or function of interest, $\phi(\{\beta, B, \Sigma\})$, with respect to the posterior distribution of the parameters, e.g. moments of the posterior. Since we have chosen to work with a posterior distribution $p(\beta, B, \Sigma, r|Y)$ we can estimate this integral quantity whilst removing the model risk associated with rank uncertainty. This is achieved by approximating

$$\sum_{r=1}^{n} \int \phi(\{\beta, B, \Sigma\})\,p(\beta, B, \Sigma|Y, r)\,p(r|Y)\,d\beta\,dB\,d\Sigma \approx \sum_{r=1}^{n}\sum_{j=1}^{M} \frac{1}{M}\phi(\{\beta^{(j)}, B^{(j)}, \Sigma^{(j)}\}_r)\,p(r|Y). \qquad (5.4)$$
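The difference between the BMA estimate (5.4) and the BMOS estimate can be sketched in a few lines of Python. Here the posterior draws for each rank are represented by plain lists of scalars and $\phi$ by an arbitrary function; the function names and the toy numbers are hypothetical and stand in for actual MCMC output.

```python
def bma_estimate(samples_by_rank, rank_probs, phi):
    """Model averaged Monte Carlo estimate: sum_r p(r|Y) * mean_j phi(draw_j in model r)."""
    total = 0.0
    for r, draws in samples_by_rank.items():
        mc_mean = sum(phi(x) for x in draws) / len(draws)
        total += rank_probs[r] * mc_mean
    return total


def bmos_estimate(samples_by_rank, rank_probs, phi):
    """Estimate conditional on the MAP rank only."""
    r_map = max(rank_probs, key=rank_probs.get)
    draws = samples_by_rank[r_map]
    return r_map, sum(phi(x) for x in draws) / len(draws)


samples = {1: [1.0, 3.0], 2: [10.0, 14.0]}   # toy draws for ranks 1 and 2
probs = {1: 0.25, 2: 0.75}                   # toy posterior rank probabilities
print(bma_estimate(samples, probs, lambda x: x))
print(bmos_estimate(samples, probs, lambda x: x))
```

When one rank dominates the posterior the two estimates nearly coincide, and they diverge as the rank probabilities become more evenly spread.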

Prediction Incorporating Model Risk 

Here we perform prediction whilst removing model uncertainty related to the rank. This is possible under a Bayesian Model Averaging (BMA) framework using,

$$p(Y^*|Y) = \sum_{r=1}^{n} \int p(Y^*|\beta, B, \Sigma, r)\,p(\beta, B, \Sigma|Y, r)\,p(r|Y)\,d\beta\,dB\,d\Sigma. \qquad (5.5)$$

We will compare the predictive performance of the MMSE estimate, or mean of the estimated distribution for $p(Y^*|Y)$, under the BMA approach versus the BMOS approach, which involves,

$$p(Y^*|Y) = \int p(Y^*|\beta, B, \Sigma, \hat{r}_{MAP})\,p(\beta, B, \Sigma|Y, \hat{r}_{MAP})\,d\beta\,dB\,d\Sigma. \qquad (5.6)$$

6 Simulation Experiments 

Analysis of the methodology developed is in three parts: the first part contains simulations performed on synthetic data sets, comparing performance of the proposed model sampling methodology; the second part contains two real data set examples; and the third part involves analysis of the predictive performance of BMOS and BMA using real data.

6.1 Synthetic Experiments 

In this section the intention will be to develop a controlled setting in which the true model 
parameters are known and the data is generated from the true model. This will allow us to 
assess performance of each of the proposed estimation procedures. In doing this we take an 
identical model to the simple model studied in Sugita (2002; 2009) [p. 4] for our analysis. 
6.1.1 Analysis of samplers

The first analysis is to compare the performance of the two adaptive samplers. To achieve this we generate 20 realizations of data sets of length T = 100 from the rank r = 2 model. Then, conditional on knowledge of the rank r = 2, we draw J = 20,000 samples from the joint posterior $p(B, \Sigma, \beta|Y, r = 2)$ and discard the first 10,000 samples as burn-in. We perform this analysis for each of the data realizations under both of the proposed samplers, Algorithm 1 and Algorithm 2, and then we present average MMSE estimates and average posterior standard deviations from each sampler in Table 1. In particular we present the averaged posterior point estimates for: the unrestricted $\beta$ parameters; the average trace of the posterior estimate of the covariance $\Sigma$; the average of each of the intercept terms; and the averaged first element of the unrestricted $\alpha$.

Note that the pre-tuning of the local random walk proposal standard deviation for Algorithm 1
is performed offline using an MCMC run of length 20,000. Additionally, the prior parameters
were set as follows: for $B$, $P = (W'W)^{-1} W'Y$ and $A = \lambda (W'W)/T$
with $\lambda = 1$, $W = (X, Z\bar{\beta})$ and $\bar{\beta} = [I_r, 0]$; for $\beta$, $E[\beta] = (I_r, 0)$,
$Q = I_n$, $H = \tau Z'Z$ and $\tau = 1/T$; for $\Sigma$, $S = \tau Y'Y$ and $h = n + 1$.

These results demonstrate that both Algorithm 1 and Algorithm 2 perform well. The MMSE
estimates produced by both algorithms are accurate relative to the true parameter values
used to generate the data. Algorithm 1, which involves a mixture of pre-tuned local moves
and a global move centered on the maximum likelihood parameter estimates, required more
computational effort than the adaptive MCMC approach of Algorithm 2. Additionally, as
discussed in Rosenthal (2008), the sampler we developed in Algorithm 2
achieves optimal performance as $n \to \infty$. It should therefore scale far better than the
griddy Gibbs sampler approach, which is not feasible in high dimensions. Hence, as an
automated and computationally efficient alternative to the griddy Gibbs sampler typically used,
we recommend Algorithm 2, and we use it in the following studies.
To conclude, we present trace plots of the sample
paths under the adaptive MCMC algorithm in Figure 1. This plot demonstrates rapid
convergence of the MMSE estimates of the posterior parameters, even when the chain is initialized far
from the true values. Additionally, one can see the adaptive proposal learning
the appropriate proposal variance.
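
For readers unfamiliar with the Adaptive Metropolis proposal of Roberts and Rosenthal (2009), the following Python sketch illustrates the idea on a generic vector-valued target. It is a simplified illustration under stated assumptions and not the matrix-variate sampler of Algorithm 2: the function name `adaptive_metropolis`, the mixture weight `mix` and the interface taking a log-density callable are hypothetical choices made for the example.

```python
import numpy as np


def adaptive_metropolis(log_target, x0, n_iter=3000, mix=0.05, seed=0):
    """Random walk Metropolis whose proposal covariance is learnt on-line."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    lp = log_target(x)
    chain = np.empty((n_iter, d))
    mean = np.zeros(d)      # running mean of the chain
    m2 = np.zeros((d, d))   # running sum of outer products of deviations
    accepted = 0
    for t in range(n_iter):
        # Fixed small-scale component: always used at the start, then with
        # probability `mix`, so the proposal never degenerates.
        if t <= 2 * d or rng.random() < mix:
            prop = x + rng.normal(scale=0.1 / np.sqrt(d), size=d)
        else:
            cov = m2 / (t - 1)  # empirical covariance of the t states so far
            prop = rng.multivariate_normal(x, (2.38 ** 2 / d) * cov)
        lp_prop = log_target(prop)
        # Both mixture components are symmetric, so the acceptance ratio
        # reduces to the ratio of target densities.
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        chain[t] = x
        # Welford-style update of the running mean and covariance.
        delta = x - mean
        mean += delta / (t + 1)
        m2 += np.outer(delta, x - mean)
    return chain, accepted / n_iter
```

For example, `adaptive_metropolis(lambda z: -0.5 * float(np.sum(z ** 2)), np.zeros(2))` runs the sketch on a two-dimensional standard Gaussian target and returns the chain together with the empirical acceptance rate.
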

6.1.2 Analysis of Adaptive MCMC sampler in high dimension

In this example, we work with the adaptive MCMC algorithm we developed for the Bayesian
CVAR model. In particular, we consider the case n = 10, a setting in which
the standard griddy Gibbs sampler becomes computationally prohibitive due
to the curse of dimensionality, since there are now several hundred parameters to be sampled
from the posterior.

All coefficients except for the cointegrating vectors are generated from uniform distributions
on the range -0.4 to 0.4, and the error covariance is set to the identity. We generate a
realization of data of length T = 100 from the true rank r = 5 model, in which all unrestricted
terms of the cointegration matrix $\beta$ are set to 0, other than the last row,
which is -1. Then, conditional on knowledge of the rank r = 5, we draw J = 20,000 samples
from the joint posterior $p(B, \Sigma, \beta \mid Y, r = 5)$ and discard the first 10,000 samples as burn-in.

The cointegration vector parameters randomly selected for presentation
were $\beta_{10,1}$ and $\beta_{10,4}$, whose sample paths are shown in Figure 2. Again, in this high dimensional setting
(310 dimensions), the adaptive MCMC algorithm performs well. Even though the Markov
chain is initialized far from the true parameter values of the cointegration vector, we see rapid
convergence of our sampler. This is illustrated for the two arbitrarily selected parameters, both of which
had true values of -1. Note that in this high dimensional setting the algorithm was
implemented in Matlab and took only 132 seconds to complete the simulation on an Intel Core 2 Duo
at 2.40 GHz with 3.56 GB of RAM.
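
The figure of 310 dimensions follows from the parameter count $3n^2 + n$ stated for an n-dimensional CVAR series. A small Python check of this arithmetic is given below; the function name `cvar_posterior_dimension` is hypothetical.

```python
def cvar_posterior_dimension(n):
    """Dimension 3n^2 + n of the matrix-variate posterior for an n-dimensional series."""
    return 3 * n ** 2 + n


# Dimension for a pair (n = 2), the synthetic example (n = 4) and this example (n = 10).
for n in (2, 4, 10):
    print(n, cvar_posterior_dimension(n))
```
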

6.1.3 Analysis of model selection in the Bayesian CVAR model 

In this section we study, on synthetic data, the performance of the Bayes Factor estimator applied
to estimate posterior model probabilities for the rank. To perform this analysis we consider the
model from Sugita (2002; 2009) [p. 4], take data series of length T = 100, and simulate
50 independent data realizations for each possible model rank r = 1, ..., 4. Then, for each rank
r, we count the number of times each model is selected as the MAP estimate out of the total
of 50 simulations, with one simulation per generated data set. Note that the algorithm was run for
20,000 iterations, with the first 10,000 samples discarded as burn-in. The results of this analysis are presented
in Table 2.

We note that the results of this section demonstrate the following properties:

1. When the true rank used to generate the observed data was small, the BF methodology
was clearly able to detect the true model order as the MAP estimate in a high proportion
of the tested data sets.

2. In all cases the averaged posterior model probabilities were very selective of the
correct model, indicating that, at least under this synthetic data scenario, there would
not be great benefit in performing model averaging. However, we will later present
examples with actual data in which there is significant ambiguity between possible model
ranks; in these cases we also study the model averaging results.
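
The MAP counting described in this section can be sketched in Python as follows. This is a minimal illustration that assumes equal prior probability for every candidate rank and takes log Bayes Factors measured against rank 0 as input; the function names `rank_posterior` and `map_rank_counts` are hypothetical.

```python
import numpy as np


def rank_posterior(log_bf):
    """Posterior rank probabilities from log Bayes factors against rank 0.

    Assumes equal prior probability for every candidate rank.
    """
    log_bf = np.asarray(log_bf, dtype=float)
    # Subtract the largest log Bayes factor before exponentiating to avoid overflow.
    w = np.exp(log_bf - log_bf.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)


def map_rank_counts(log_bf_per_dataset):
    """Count how often each rank is the MAP estimate across data sets.

    log_bf_per_dataset: array of shape (n_datasets, n_ranks).
    """
    probs = rank_posterior(log_bf_per_dataset)
    return np.bincount(np.argmax(probs, axis=1), minlength=probs.shape[1])
```

Each row of the input corresponds to one simulated data set, and the returned vector gives the number of data sets in which each rank was selected.
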

6.2 Financial Example 1 - US mini indexes 

Having assessed the proposed algorithms on synthetic data generated
from a CVAR model, we now work with a practical financial example. In this example we
consider data series comprising the US indexes S&P mini, Nasdaq mini and Dow Jones mini. The
data for each of these series consists of 774 values corresponding to the daily closing
price from 31-Aug-2005 through to 30-Sep-2008. The time series data is
presented in Figure 3.

We analyze this data using Algorithm 2 (adaptive MCMC) and estimate the rank via Bayes
Factor analysis; the results are presented in Table 3. We run 20 independent samplers with
different initializations for each possible rank. This is performed on nested subsets of the
total series of increasing length, from 50 data points
through to 400 data points in increments of 50 data points. This allows us to study the change in
the estimated rank as a function of time for each of these time series. If the true rank of
our model were fixed, then as the total amount of data included increases we should see
the posterior model probability converge to 1 for one of the possible ranks. What we
observed instead was clear variability in the estimated rank as
we included more data. In particular, the model estimates most often showed a preference for rank
1, suggesting that 2 common stochastic trends are present in the series. Additionally, the fact
that in several cases the model does not clearly distinguish between ranks 1 and 2 suggests it may
be prudent to also perform a model averaging analysis, especially in the popular practical application of
CVAR models to algorithmic trading.

6.3 Financial Example 2 - US notes

Here we repeat the procedure performed in Financial Example 1 for a different data set.
This time we consider data series comprising bond data for US 5 year, 10 year and 30 year
notes over the same time period as the US mini index data. The time series data is presented
in Figure 4.

We analyze this data using Algorithm 2 (adaptive MCMC) and estimate the rank via Bayes
Factor analysis; the results are presented in Table 4. We set up this second data analysis
in the same way as Financial Example 1, with 20 independent samplers, each with different
initializations, for each possible rank. This allows us to study the change in the estimated rank
as a function of time for each of these time series. Again, we observed that with this data the
model most often gave preference to rank 1, suggesting that 2 common trends are present in the
series we are analyzing. However, there was much stronger evidence for a single co-integrating
relationship over time in this data, compared to the analysis of the US mini index data over the
same period. This suggests that the US bond data series is a more stable series to which to fit the CVAR
model when assuming a constant number of co-integrating relationships over time.

6.4 Financial Example 3 

In this section we compare the predictive performance of Bayesian Model Selection
and Bayesian Model Averaging. We take 2 series for the US bonds, 5 years and 10 years, and we combine
these series over the same period with the S&P 500 mini index. We compare the MMSE
estimate of the predicted series over 10 steps ahead, which is obtained from the distribution of
the predicted data $p(Y^* \mid Y)$ after we have integrated out parameter and rank uncertainties.
We demonstrate that, in this actual data example, Bayesian Model
Averaging represents the uncertainty in the prediction more accurately than the Bayesian Model
Order Selection setting.

This study is performed as follows. We begin by randomly selecting, with replacement, 100
segments of the vector time series, each containing 50 days of data. For each segment of the
time series we fit our Bayesian model for each possible rank, also estimating via Bayes Factors
the posterior model probability for each rank. Then we calculate the predictive posterior mean,
corresponding to the MMSE estimate of the predicted data series for the following 5 days, $Y^*$.
Finally, we take the squared difference between the actual data series over the 10
days following the 50 days of the given segment and the posterior mean of the predicted data $Y^*$.
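
The segment-based evaluation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `forecast_fn` is a hypothetical stand-in for the Bayesian predictive mean (under model selection or model averaging), `naive_forecast` is a placeholder used only to make the example runnable, and the forecast horizon is left as a parameter.

```python
import numpy as np


def segment_squared_errors(series, forecast_fn, n_segments=100, window=50,
                           horizon=5, seed=0):
    """Squared forecast errors over randomly chosen segments of a vector series.

    series: array of shape (T, n_series).
    forecast_fn: callable (train, horizon) -> array (horizon, n_series);
        hypothetical stand-in for the posterior predictive mean.
    """
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    n_obs = series.shape[0]
    # Segment start points are drawn with replacement.
    starts = rng.integers(0, n_obs - window - horizon + 1, size=n_segments)
    errors = np.empty((n_segments, horizon, series.shape[1]))
    for k, s in enumerate(starts):
        train = series[s:s + window]
        actual = series[s + window:s + window + horizon]
        errors[k] = (actual - forecast_fn(train, horizon)) ** 2
    return errors


def naive_forecast(train, horizon):
    """Placeholder forecaster: repeat the last observed value."""
    return np.repeat(train[-1:], horizon, axis=0)
```

Slicing the returned array along its second axis gives, for each prediction day, the sample of squared errors that a boxplot would summarize.
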

In Figure 5 we present, for each prediction day, a boxplot of the squared difference between
the actual data over the random sets of 5 days and the predictive posterior MMSE estimators
for the same 5 days. We compare here the performance under Bayesian model selection and
averaging. When performing Bayesian model averaging we are integrating out the uncertainty in
the prediction due to the unknown rank.

Clearly, the Bayesian model averaging approach results in greater uncertainty in the
prediction when compared to Bayesian model selection. This is reflected especially in the
distribution of the prediction at 5 days, where the box-and-whisker plot for the model averaging approach
covers a noticeably wider range than the model selection equivalent. Though not presented here,
we also assessed and confirmed that this occurs for longer prediction horizons of 10 days and 20
days.

7 Conclusions 

We have developed and demonstrated how one can utilize state of the art adaptive MCMC
methodology to solve a challenging high dimensional econometrics problem based on cointegrated
vector autoregressions. The application involved a posterior distribution
which was matrix-variate and very high dimensional. We compared the performance of the
Adaptive Metropolis algorithm with an alternative based on a mixture proposal of local moves and
global moves centered on the maximum likelihood parameter estimates. We then formulated
rank estimation as a Bayesian model selection problem and analyzed the Bayes
factors using our adaptive MCMC algorithm. We concluded with an analysis of real market data,
performing Bayesian model selection and model averaging with respect to the unknown
rank. In conclusion, the adaptive MCMC methodology developed allowed us to significantly extend
the dimension of the estimation problem relative to the Bayesian CVAR literature. It was
shown to be efficient and accurate in the experiments considered.

From the perspective of developing a Bayesian CVAR model for algorithmic trading, we
found that historically the US bond data we considered is a more stable series to which to fit the CVAR
model when assuming a constant number of co-integrating relationships over time. This
will therefore impact the stability of trading performance under such models. In addition, when
considering trading triples made up of the US bond data series and the S&P mini index, it is
beneficial to perform Bayesian model averaging for the rank, rather than just selecting the most
probable co-integration rank. The adaptive MCMC based framework allows this to be done
efficiently and in an automated fashion.

10 Appendix 1



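We begin by calculating the log posterior model probabilities,

$$\log\left(\Pr(r \mid Y)\right) = \log\left(BF_{r|0}\right) - \log\left(BF_{max|0}\right) - \log\left(\sum_{i=0}^{n} \exp\left(\log\left(BF_{i|0}\right) - \log\left(BF_{max|0}\right)\right)\right), \qquad (10.1)$$

where $BF_{max|0} = \max\{BF_{0|0}, \ldots, BF_{n|0}\}$. Additionally, we now consider the log of the Bayes
Factor for rank r and we apply the same numerical trick.

$$\log\left(BF_{r|0}\right) = \log\left(p(\alpha' = 0_{r \times n})\right) + \log\left(C_r\right) - \log\left(p(\alpha' = 0_{r \times n} \mid Y)\right) \qquad (10.2)$$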

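Now, considering each of the terms:

The prior ordinate $\log\left(p(\alpha' = 0_{r \times n})\right)$ is available in closed form. It is the sum of a term in $\log(\pi)$, terms in $\log(|S|)$, a term in $\log(|A_{22.1}|)$ and a sum over $i = 1, \ldots, n$ of log ratios of Gamma functions.

The posterior ordinate is estimated from the $N$ samples of the Markov chain as

$$\log\left(p(\alpha' = 0_{r \times n} \mid Y)\right) = -\log(N) + \log\left(L^{(1)}_{max}\right) + \log\left(\sum_{i=1}^{N} \exp\left(\log\left(L^{(1)}_{i}\right) - \log\left(L^{(1)}_{max}\right)\right)\right),$$

where $L^{(1)}_{i}$ is the closed-form density ordinate evaluated at the $i$-th sample, which involves Gamma functions with arguments of the form $(t + h + 1 - i)/2$, and $L^{(1)}_{max} = \max\{L^{(1)}_{1}, \ldots, L^{(1)}_{N}\}$.

The correction term is estimated as

$$\log\left(C_r\right) = -\log(N) + \log\left(L^{(2)}_{max}\right) + \log\left(\sum_{i=1}^{N} \exp\left(\log\left(L^{(2)}_{i}\right) - \log\left(L^{(2)}_{max}\right)\right)\right),$$

where $L^{(2)}_{i}$ is computed from $p(\alpha = 0, \Gamma^{(i)} \mid \Sigma^{(i)})$ and $p(\Gamma^{(i)} \mid \Sigma^{(i)})$, and $L^{(2)}_{max} = \max\{L^{(2)}_{1}, \ldots, L^{(2)}_{N}\}$. Note that this sum is evaluated
using samples from the Markov chain run in model r, where $p(\alpha = 0, \Gamma^{(i)} \mid \Sigma^{(i)})$ and
$p(\Gamma^{(i)} \mid \Sigma^{(i)})$ are obtained using knowledge of the specified prior, $p(B \mid \Sigma) = p(\Gamma, \alpha \mid \Sigma) = p(\mu, \Psi_{1:p-1}, \alpha \mid \Sigma)$.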
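
The numerical trick used in this appendix, subtracting the largest log term before exponentiating, can be sketched in Python as follows. The function name `log_mean_exp` is hypothetical; it returns the log of the average of N density ordinates given their logs, which is the form of estimator used for the sample-based terms above.

```python
import numpy as np


def log_mean_exp(log_values):
    """Stable evaluation of log((1/N) * sum_i exp(log_values[i]))."""
    log_values = np.asarray(log_values, dtype=float)
    m = log_values.max()  # largest log ordinate
    # -log(N) + log(L_max) + log(sum_i exp(log L_i - log L_max))
    return -np.log(log_values.size) + m + np.log(np.sum(np.exp(log_values - m)))
```

Because every exponent is at most zero after the shift, the sum cannot overflow, and at least one term equals one, so the final logarithm is well defined even when all ordinates are extremely small.
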

| Parameter Estimates | Algorithm 1 | Algorithm 2 | Truth |
|---|---|---|---|
| Ave. MMSE $\beta_{1,r+1}$ | -0.002 (0.001) | -0.034 (0.002) | 0 |
| Ave. Posterior Stdev. $\beta_{1,r+1}$ | 0.018 (0.006) | 0.010 (0.003) | - |
| Ave. MMSE $\beta_{2,r+1}$ | -0.819 (0.051) | -0.862 (0.045) | -1 |
| Ave. Posterior Stdev. $\beta_{2,r+1}$ | 0.032 (0.005) | 0.020 (0.003) | - |
| Ave. MMSE $\beta_{1,n}$ | 0.033 (0.025) | -0.024 (0.023) | 0 |
| Ave. Posterior Stdev. $\beta_{1,n}$ | 0.030 (0.012) | 0.026 (0.010) | - |
| Ave. MMSE $\beta_{2,n}$ | -0.752 (0.098) | -0.774 (0.082) | -1 |
| Ave. Posterior Stdev. $\beta_{2,n}$ | 0.038 (0.013) | 0.028 (0.006) | - |
| Ave. Mean acceptance probability $\beta$ | 0.352 (0.010) | 0.232 (0.029) | - |
| Ave. MMSE tr($\Sigma$) | 4.945 (0.331) | 4.432 (0.332) | 4 |
| Ave. Posterior Stdev. tr($\Sigma$) | 0.420 (0.049) | 0.416 (0.048) | - |
| Ave. MMSE $\mu_1$ | 0.07 (0.051) | 0.065 (0.043) | 0.1 |
| Ave. Posterior Stdev. $\mu_1$ | 0.236 (0.028) | 0.226 (0.026) | - |
| Ave. MMSE $\mu_2$ | -0.027 (0.041) | -0.034 (0.024) | 0.1 |
| Ave. Posterior Stdev. $\mu_2$ | 0.183 (0.041) | 0.181 (0.010) | - |
| Ave. MMSE $\mu_3$ | -0.080 (0.084) | -0.061 (0.045) | 0.1 |
| Ave. Posterior Stdev. $\mu_3$ | 0.199 (0.020) | 0.187 (0.015) | - |
| Ave. MMSE $\mu_4$ | 0.024 (0.049) | 0.030 (0.029) | 0.1 |
| Ave. Posterior Stdev. $\mu_4$ | 0.184 (0.010) | 0.185 (0.011) | - |
| Ave. MMSE $\alpha_{1,1}$ | -0.223 (0.015) | -0.224 (0.016) | -0.2 |
| Ave. Posterior Stdev. $\alpha_{1,1}$ | 0.070 (0.006) | 0.068 (0.005) | - |
| Ave. MMSE $\alpha_{1,2}$ | 0.201 (0.013) | 0.202 (0.013) | 0.2 |
| Ave. Posterior Stdev. $\alpha_{1,2}$ | 0.053 (0.002) | 0.052 (0.002) | - |

Table 1: Sampler Analysis - Algorithm 1 is the pre-tuned mixture proposal of a global ML move
and a local pre-tuned MCMC move; Algorithm 2 is the global adaptively learnt MCMC proposal.
Averages and standard errors are taken for the Bayesian point estimators over 20 data sets; the
standard errors are presented in brackets. Note that in all simulations the Markov chain is
initialized very far from the true parameter values.

| True rank | r = 0 | r = 1 | r = 2 | r = 3 | r = 4 |
|---|---|---|---|---|---|
| 1 | 3 (0.84) | 16 (0.93) | 2 (0.92) | 0 (-) | 0 (-) |
| 2 | 0 (-) | 5 (0.89) | 13 (0.91) | 0 (-) | 2 (0.92) |
| 3 | 0 (-) | 0 (-) | 4 (0.89) | 6 (0.90) | 10 (0.94) |
| 4 | 0 (-) | 0 (-) | 0 (-) | 2 (0.87) | 18 (0.89) |

Table 2: Between Model Analysis (Bayes Factors) - The true model rank used to generate the data is
given in the first column. TDMCMC is the Trans-dimensional Markov chain Monte Carlo algorithm
utilizing adaptive MH within model moves and the global Independent between model moves.
The results represent the total number of times a given rank is selected as the MAP estimate
out of the 20 independent data sets, each of length T = 100, analyzed. Additionally, the average
posterior model probability for these cases is presented in brackets.

| Rank \ T | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 |
|---|---|---|---|---|---|---|---|---|
| r = 0 | | | | | | | | |
| r = 1 | 8.09 (0.78) | 3.77 (0.29) | 7.01 (0.50) | 11.51 (0.90) | 1.69 (0.54) | 2.71 (3.55) | 3.14 (1.11) | 7.77 (0.83) |
| r = 2 | 2.91 (1.24) | 2.33 (1.26) | 4.61 (0.63) | 25.36 (7.19) | -5.33 (1.17) | -5.80 (0.97) | 4.92 (1.06) | -3.88 (1.11) |
| r = 3 | -26.03 (1.06) | -8.45 (0.27) | -37.25 (1.08) | -55.79 (1.70) | -14.61 (0.03) | -62.60 (3.31) | 8.88 (0.88) x 10^-3 | -2.06 (2.48 x 10^-2) |

Table 3: Log Bayes Factors: Analysis of VAR series of US mini indexes as a function of data size. Average log Bayes Factors and standard
deviation of log Bayes Factors over 20 independent Markov chains, each of chain length 20,000.



| Rank \ T | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 |
|---|---|---|---|---|---|---|---|---|
| r = 0 | | | | | | | | |
| r = 1 | 4.81 (1.00) | 3.14 (0.43) | 5.36 (1.06) | 5.92 (0.84) | 3.32 (0.75) | 1.30 (0.37) | 7.30 (0.59) | 3.10 (0.48) |
| r = 2 | -1.67 (12.78) | 3.66 (3.87) | -3.75 (3.04) | -1.83 (2.61) | -6.02 (3.22) | 0.14 (6.51) | -2.93 (1.96) | -7.73 (2.46) |
| r = 3 | -42.44 (12.38) | -48.58 (2.85) | -33.12 (0.14) | -100.42 (4.82) | -25.91 (6.52 x 10^-2) | -10.33 (0.72) | -142.89 (3.31) | -195.47 (4.71) |

Table 4: Log Bayes Factors: Analysis of VAR series of US Bonds (5, 10, 30 Year Notes) as a function of data size. Average log Bayes
Factors and standard deviation over 20 independent Markov chains, each of chain length 20,000.



[Figure 1 panels: Markov chain sample paths for the $\beta$ parameters]

Figure 1: Sample paths for posterior parameters, using 100 data points, true rank
known and an adaptive MCMC algorithm.

Figure 2: Sample paths for posterior parameters, using 100 data points, true rank
known and an adaptive MCMC algorithm.

Figure 3: S&P 500, Dow Jones and Nasdaq mini index daily close price data between 01-May-08
and 18-Sep-08. Left column plots represent scaled raw prices; right column plots represent the differenced data
series.

Figure 4: 5, 10, 30 Year Notes - daily close price data between 01-May-08 and 18-Sep-08. Left
column plots represent scaled raw prices; right column plots represent the differenced data series.

Figure 5: Empirical distribution of the predictive performance of Bayesian Model Averaging and Bayesian Model Order
Selection for a combination of mini-index and bond data, taken over
random intervals.